Predicting recall RT with 2-AFC RT (v2.0B)

March 4 update

  • Add 29 more participants
  • Add simulated pairings analysis

Exclusions

# Keep participants with recall accuracy above 25%
trials = all_trials %>% group_by(wid) %>% filter(mean(correct) > 0.25)
n_exclude = length(unique(all_trials$wid)) - length(unique(trials$wid))
N = length(unique(trials$wid))

# RT-based trial exclusion (threshold computed but filter currently disabled)
nt = nrow(trials)
max_rt = with(trials, mean(rt, na.rm=TRUE) + 5 * sd(rt, na.rm=TRUE))
# trials = filter(trials, rt < max_rt)
n_trial = nrow(trials)
n_drop_rt = nt - n_trial

trials %<>% mutate(
    log_afc_rt = log(afc_rt),
    recall_rt = rt,
    log_recall_rt = log(rt)
)
  • Excluding 0 participants who responded correctly on fewer than 25% of recall trials, leaving 56 participants in the analysis.

Goal

In the critical phase of our study, we will show participants two images and ask them to enter the word associated with one of them (a “multi-cue recall” task). The primary model prediction is that people will spend more time looking at (trying to remember) the image for which partial recall progresses more quickly.

knitr::include_graphics("../model/figs/fixation_by_strength.png")

To see this prediction in data, we need to be able to manipulate the memorability of different pairs. Initial attempts to do this using pre-existing word memorability scores were not very successful (see Word type effects). Instead, we can measure how well a participant has learned each pair in a preliminary 2-AFC task and use this measure to predict fixations in the multi-cue recall task.

In this experiment, we wanted to test whether this approach could work by asking whether reaction time in the 2-AFC task reliably predicts reaction time in a single-cue recall task.

Does 2-AFC RT predict cue-recall RT?

Is the reaction time on 2AFC trials for a given word predictive of reaction time for the same word in the recall task? Critically, we need to see the effect within individuals, so we plot linear fits for each participant and run a mixed effects model with random intercepts.

ggplot(trials, aes(log_afc_rt, log_recall_rt, group=wid)) + 
    geom_point(alpha=0.2) +
    geom_smooth(method=lm, se=F)

trials %>% with(lmer(log_recall_rt ~ log_afc_rt + (1|wid))) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 5.491 0.282 19.471 664.419 0.000
log_afc_rt 0.272 0.039 6.921 697.947 0.000
p values calculated using Satterthwaite d.f.

Looks like we have a consistent and reasonably sized effect! We can visualize the individual effects more clearly by applying participant-wise mean centering:

trials %<>% group_by(wid) %>% mutate(
    centered_log_recall_rt = log_recall_rt - mean(log_recall_rt),
    centered_log_afc_rt = log_afc_rt - mean(log_afc_rt)
) %>% ungroup()

ggplot(trials, aes(centered_log_afc_rt, centered_log_recall_rt, group=wid)) + 
    geom_smooth(method=lm, se=F)

And here is the distribution of slopes (with 95% confidence intervals) and correlations for each participant.

effects = trials %>% 
    nest(-wid) %>% 
    mutate(
        fit = map(data, ~ 
            lm(log_recall_rt ~ log_afc_rt, data=.) %>% 
            tidy(conf.int = T)
        )
    ) %>% 
    unnest(fit) %>% 
    filter(term == 'log_afc_rt') %>% 
    arrange(estimate)

notick = theme(
  axis.text.x = element_blank(),
  axis.text.y = element_blank(),
  axis.ticks = element_blank())

corrs = trials %>%
    group_by(wid) %>% 
    summarise(cor=cor(log_recall_rt, log_afc_rt, use="complete"))

effects = left_join(effects, corrs)

p1 = ggplot(effects, aes(reorder(wid, estimate), estimate)) + 
    geom_point() +
    geom_errorbar(aes(ymin=conf.low , ymax=conf.high)) +
    geom_hline(yintercept=0, color="red") +
    labs(y="slope", x="participant") +
    coord_flip() + notick


p2 = ggplot(effects, aes(reorder(wid, estimate), cor)) + 
        geom_point() +
        geom_hline(yintercept=0, color="red") + 
        labs(y="correlation", x="participant") +
        coord_flip() + notick

prop_slope_positive = with(effects, mean(estimate > 0))

p1 | p2

73% of participants have a positive slope.
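As a quick sanity check (a sketch, not part of the original analysis), an exact binomial sign test on the slope directions in the `effects` table tells us whether 73% positive is more than we would expect by chance:

```r
# Hedged sketch: sign test on the proportion of participants with a
# positive slope, against the 50% chance level.
n_pos = sum(effects$estimate > 0)
binom.test(n_pos, nrow(effects), p = 0.5)
```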

Simulated pairings

Another way to analyze these data is to ask what the RT differences would have been if we had paired the items as we plan to in the critical experiment: slowest with fastest, second slowest with second fastest, and so on. After much exploration, we found that we get the best results when we ignore 2-AFC accuracy and drop the first round (leaving just two RT measurements per word).
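To make the pairing scheme concrete, here is a toy illustration with 20 made-up scores (the scores are hypothetical; the labeling logic matches the `make_pairs` function below):

```r
# After sorting, the 10 fastest words are 'easy' and the 10 slowest
# are 'hard'; pair i matches the i-th fastest with the i-th slowest,
# so pair 10 contains the most extreme contrast.
toy = data.frame(score = sort(rnorm(20)))
toy$kind = rep(c('easy', 'hard'), each = 10)
toy$pair = c(10:1, 1:10)
toy
```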

make_pairs = function(filt, score) {
    afc %>%
        filter({{ filt }}) %>% 
        mutate(score={{ score }}) %>% 
        group_by(wid, word) %>% 
        summarise(score=mean(score)) %>% 
        group_by(wid) %>% 
        mutate(score  = score - median(score)) %>% 
        arrange(score, .by_group=T) %>% 
        mutate(kind=rep(c('easy', 'hard'), each=10), pair=c(10:1, 1:10)) %>% 
        left_join(select(trials, wid, word, recall_rt=rt))
}

make_diff = function(pairs) {
    pairs %>% 
        pivot_wider(c(wid, pair), values_from=recall_rt, names_from=kind) %>%
        mutate(diff = log(hard) - log(easy))
}

pairs = make_pairs(round > 1, log(rt))
p1 = ggplot(pairs, aes(pair, log(recall_rt), color=kind)) +
    stat_summary(position=position_dodge2(.2)) +
    geom_smooth(method='lm', se=F)

dif = make_diff(pairs)
p2 = ggplot(dif, aes(pair, diff)) + 
    stat_summary() + 
    labs(y="log(hard RT) - log(easy RT)") +
    geom_smooth(method='lm') + 
    geom_hline(yintercept=0)

p1 | p2

dif %>% with(lm(diff ~ pair)) %>% summ(scale=TRUE)
Est. S.E. t val. p
(Intercept) 0.093 0.023 4.041 0.000
pair 0.055 0.023 2.405 0.017
Standard errors: OLS; Continuous predictors are mean-centered and scaled by 1 s.d.

There’s a clear difference between the easy (fast 2AFC RT) and hard (slow 2AFC RT) trials. And the effect grows with the extremity of the difference (pair 10 has the slowest and fastest words).

Other scoring approaches

try_strategy = function(filt, score) {
    make_pairs({{filt}}, {{score}}) %>% make_diff %>% lm(diff ~ pair, data=.) %>% summ(scale=TRUE)
}
# Exclude first round, sort by log(rt) only
try_strategy(round > 1, log(rt))
Est. S.E. t val. p
(Intercept) 0.093 0.023 4.041 0.000
pair 0.055 0.023 2.405 0.017
Standard errors: OLS; Continuous predictors are mean-centered and scaled by 1 s.d.
# No exclusion
try_strategy(TRUE, log(rt))
Est. S.E. t val. p
(Intercept) 0.103 0.023 4.467 0.000
pair 0.040 0.023 1.713 0.087
Standard errors: OLS; Continuous predictors are mean-centered and scaled by 1 s.d.
# Sort by accuracy, then log(rt)
try_strategy(round > 1, 1000*(1-correct) + log(rt))
Est. S.E. t val. p
(Intercept) 0.075 0.024 3.189 0.002
pair 0.054 0.024 2.289 0.022
Standard errors: OLS; Continuous predictors are mean-centered and scaled by 1 s.d.

Other junk

Below is a bunch of other stuff we tried that isn’t really worth looking at.

AFC accuracy

ggplot(trials, aes(afc_accuracy, log_recall_rt)) + 
    stat_summary()

trials %>% with(lm(log_recall_rt ~ afc_accuracy)) %>% summ
Est. S.E. t val. p
(Intercept) 7.780 0.122 63.561 0.000
afc_accuracy -0.371 0.126 -2.935 0.003
Standard errors: OLS

This looks promising, but it turns out the effect is driven entirely by individual differences (in the plot below, each line is a participant). Mixed effects regression shows no effect of 2-AFC accuracy, especially once we account for RT on the correct 2-AFC trials.

trials %>%
    group_by(wid, afc_accuracy) %>% 
    summarise(log_recall_rt=mean(log_recall_rt)) %>% 
    ggplot(aes(afc_accuracy, log_recall_rt, group=wid)) + 
        geom_line() + theme(legend.position = "none")

trials %>% with(lmer(log_recall_rt ~ afc_accuracy + (1|wid))) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 7.450 0.114 65.612 917.058 0.000
afc_accuracy -0.022 0.111 -0.195 1014.138 0.845
p values calculated using Satterthwaite d.f.
trials %>% with(lmer(log_recall_rt ~ afc_accuracy + log_afc_rt + (1|wid))) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 5.504 0.304 18.104 771.007 0.000
afc_accuracy -0.013 0.109 -0.119 1019.314 0.906
log_afc_rt 0.272 0.039 6.914 698.264 0.000
p values calculated using Satterthwaite d.f.

Time course

ggplot(afc, aes(round, log_afc_rt)) + 
    stat_summary()

ggplot(afc, aes(round, as.numeric(correct))) + 
    stat_summary()

2AFC RT goes down, as we would expect.

df = afc %>%
    select(wid, word, round, log_afc_rt) %>%
    pivot_wider(names_from=round, values_from=log_afc_rt, names_prefix='log_afc_rt') %>% 
    inner_join(select(trials, wid, word, log_recall_rt))

lmer(log_recall_rt ~  log_afc_rt1 + log_afc_rt2 + log_afc_rt3 + (1|wid), data=df) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 5.021 0.353 14.231 521.984 0.000
log_afc_rt1 0.099 0.030 3.322 960.106 0.001
log_afc_rt2 0.078 0.039 2.012 928.945 0.045
log_afc_rt3 0.164 0.042 3.933 970.067 0.000
p values calculated using Satterthwaite d.f.

The slope appears to be roughly constant over rounds.
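One way to check this more directly (a sketch, not part of the original analysis) is to fit the round-by-RT interaction in long format; if the slope is constant, the interaction terms should be near zero:

```r
# Hedged sketch: test whether the AFC-RT slope differs across rounds
# via a log_afc_rt : factor(round) interaction.
afc %>%
    select(wid, word, round, log_afc_rt) %>%
    inner_join(select(trials, wid, word, log_recall_rt)) %>%
    lmer(log_recall_rt ~ log_afc_rt * factor(round) + (1 | wid), data = .) %>%
    summ
```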

select(df, starts_with('log')) %>% cor(use="pairwise.complete.obs") %>% kable
log_afc_rt1 log_afc_rt2 log_afc_rt3 log_recall_rt
log_afc_rt1 1.0000000 0.3632215 0.3282305 0.2645305
log_afc_rt2 0.3632215 1.0000000 0.6751765 0.2876823
log_afc_rt3 0.3282305 0.6751765 1.0000000 0.3090956
log_recall_rt 0.2645305 0.2876823 0.3090956 1.0000000

But the correlation with recall RT is higher in rounds 2 and 3 (see the last column of the correlation matrix).

Word type effects

Accuracy

All participants (before exclusion)

all_trials %>% group_by(wid) %>% summarise(accuracy=mean(correct)) %>%
    ggplot(aes(accuracy)) + 
        geom_histogram(binwidth=.1)

ggplot(all_trials, aes(fill=response_type, x=word_type)) +
     geom_bar() + response_type_colors

Good participants (after exclusion)

ggplot(trials, aes(fill=response_type, x=word_type)) +
     geom_bar() + response_type_colors

glmer(correct ~ word_type + (1|wid), data=trials, family=binomial) %>% summ
Fixed Effects
Est. S.E. z val. p
(Intercept) 2.251 0.207 10.877 0.000
word_typeHigh 0.851 0.233 3.659 0.000

Reaction time

Aggregate

ggplot(trials, aes(x=response_type, y=rt, color=word_type)) + 
    stat_summary(fun.data=mean_se, geom="pointrange", position = position_dodge(width = 0.1)) + 
    theme(legend.position=c(0.2, 0.9)) + word_type_colors

trials %>% group_by(response_type, word_type) %>% summarise(rt = mean(rt)) %>% kable
response_type word_type rt
correct Low 1773.159
correct High 1791.512
intrusion Low 2522.400
intrusion High 3145.500
other Low 3214.108
other High 2686.077
timeout Low NA
timeout High NA
empty Low NA
empty High NA

By participant

grouped = trials %>% 
    group_by(wid, word_type, response_type) %>% 
    summarise(across(where(is.numeric), mean)) %>% 
    ungroup()

ggplot(grouped, aes(x=word_type, y=rt, color=response_type, group=response_type)) + 
    geom_line(alpha=0.3, size=1, aes(group=interaction(response_type, wid))) +
    stat_summary(fun.data=mean_se, geom="pointrange") + 
    stat_summary(fun.data=mean_se, geom="line", size=1) + 
    facet_wrap(~response_type) +
    response_type_colors + theme(legend.position="none")

ggplot(filter(grouped, response_type == "correct"), aes(x=word_type, y=rt, color=response_type, group=response_type)) + 
    geom_line(alpha=0.3, size=1, aes(group=interaction(response_type, wid))) +
    stat_summary(fun.data=mean_se, geom="pointrange") + 
    stat_summary(fun.data=mean_se, geom="line", size=1) + 
    response_type_colors + theme(legend.position="none")

prop = grouped %>%
    filter(response_type == "correct") %>%
    group_by(wid, word_type) %>%
    summarise(rt=mean(rt)) %>% 
    pivot_wider(names_from=word_type, values_from=rt) %>% 
    with(round(mean(Low > High) * 100))

50% of participants respond faster on high memorability trials (including only correct response trials).
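As a complementary participant-level check (a sketch, not part of the original analysis), we can run a paired t-test on each participant's mean correct-trial RT for Low vs. High words, using the `grouped` table from above:

```r
# Hedged sketch: paired comparison of participant-level mean RTs for
# Low vs. High memorability words, correct responses only. Pairs with
# a missing cell are silently dropped by t.test.
grouped %>%
    filter(response_type == "correct") %>%
    select(wid, word_type, rt) %>%
    pivot_wider(names_from = word_type, values_from = rt) %>%
    with(t.test(Low, High, paired = TRUE))
```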

Stats

lmer(rt ~ word_type + (1|wid), data=trials) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1978.017 116.059 17.043 62.722 0.000
word_typeHigh -105.688 60.931 -1.735 986.733 0.083
p values calculated using Satterthwaite d.f.
lmer(rt ~ word_type * response_type + (1|wid), data=trials) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1845.979 107.221 17.217 64.578 0.000
word_typeHigh -29.363 58.612 -0.501 980.373 0.617
response_typeintrusion 357.931 300.391 1.192 991.072 0.234
response_typeother 1025.651 163.540 6.272 997.151 0.000
response_typetimeout 4330.633 473.978 9.137 993.649 0.000
response_typeempty 4279.060 657.436 6.509 986.949 0.000
word_typeHigh:response_typeintrusion 1129.334 419.660 2.691 987.310 0.007
word_typeHigh:response_typeother -611.417 306.676 -1.994 988.702 0.046
word_typeHigh:response_typetimeout -2620.306 619.621 -4.229 984.410 0.000
p values calculated using Satterthwaite d.f.
lmer(rt ~ word_type + (1|wid), data=subset(trials, correct)) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 1847.276 108.505 17.025 60.390 0.000
word_typeHigh -29.098 52.057 -0.559 905.330 0.576
p values calculated using Satterthwaite d.f.
lmer(log_recall_rt ~ word_type + (1|wid), data=subset(trials, correct)) %>% summ
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) 7.400 0.040 184.765 62.416 0.000
word_typeHigh -0.004 0.022 -0.192 905.637 0.848
p values calculated using Satterthwaite d.f.

Typing time

ggplot(trials, aes(x=response_type, y=type_time, color=word_type)) + 
    stat_summary(fun.data=mean_se, geom="pointrange", position = position_dodge(width = 0.1)) + 
    theme(legend.position=c(0.2, 0.9)) + word_type_colors

trials %>% group_by(response_type, word_type) %>% summarise(type_time = mean(type_time)) %>% kable
response_type word_type type_time
correct Low 2883.395
correct High 2961.577
intrusion Low 3396.080
intrusion High 3833.059
other Low 4958.709
other High 4656.375
timeout Low NA
timeout High NA
empty Low 6605.913
empty High 4320.307
lm(type_time ~ word_type, data=subset(trials, correct)) %>% summ
Est. S.E. t val. p
(Intercept) 2883.395 66.867 43.121 0.000
word_typeHigh 78.182 92.984 0.841 0.401
Standard errors: OLS

Total response time

ggplot(trials, aes(x=response_type, y=rt+type_time, color=word_type)) + 
    stat_summary(fun.data=mean_se, geom="pointrange", position = position_dodge(width = 0.1)) + 
    theme(legend.position=c(0.2, 0.9)) + word_type_colors

trials %>% group_by(response_type, word_type) %>% summarise(total_time = mean(rt+type_time)) %>% kable
response_type word_type total_time
correct Low 4656.554
correct High 4753.089
intrusion Low 5918.480
intrusion High 6978.559
other Low 8172.817
other High 7342.452
timeout Low NA
timeout High NA
empty Low NA
empty High NA
lm(rt + type_time ~ word_type, data=subset(trials, correct)) %>% summ
Est. S.E. t val. p
(Intercept) 4656.554 109.600 42.487 0.000
word_typeHigh 96.535 152.408 0.633 0.527
Standard errors: OLS

Check response coding

Responses classified as “other” (incorrect but not intrusion)

trials %>% filter(word == response)  %>% with(all(correct)) %>% stopifnot
trials %>% filter(response_type == "other") %>% select(word, response) %>% kable
word response
dandelion daffodil
tread mark
quill feather
vagrant homeless
keeper guardian
keeper eqw
barracuda chapagie
grizzly fefs
tread e
seagull fd
personnel partner
goddess superb
ozone ozern
vagrant vapous
traitor trader
villain villian
fellow tranquil
stairs staircases
finger leg
house home
frog forg
rebel rag
tread trail
tread trape
roost roote
critic civic]
captive impact
rouge rogue
villain killer
patient secrets
handkerchief hankerchief
tart idk
personnel personelle
patient vacation
penguin airport
application applicant
suburb suburban
personnel personell
keeper partner
patient patience
rouge terrain
roost vacation
captive captian
antelope envelop
rouge rogue
cookie cookies
grizzly quikky
editorial literature
roost rooster
daisy target

Non-exact matches classified as correct

trials %>% filter(correct & word != response) %>% select(word, response) %>% kable
word response
goddess goddness
villain villai
volunteer volunteee
barracuda baraccuda
chimney chimeny
barracuda baracuda
shoelace shoelaces
seagull seagul
horse horses
suburb suburbs
tulip tulips
critic critc
bug bugs
barracuda barricuda
scapegoat scapegaot
refrigerator refridgerator
handkerchief handerchief
jeans jean
flower flowers
jeans jean
banana bananna
cauliflower califlower
eagle eagles
suburb subrub
tomato tomatoes
handkerchief handkerchied
suburb suburbs